HINTS & TIPS
Defining A Data Need 


This document provides a list of questions to be considered when thinking about obtaining data. The
list is not exhaustive or prescriptive, merely a suggestion of issues to consider in order to specify
your data need more completely using the seven dimensions of the ABS Data Quality Framework. 
This template can be used to help you make a judgement about the overall fitness for purpose of a 
particular data item, or collection of data items, in the context of a data need. 


Institutional Environment 


refers to the institutional 
and organisational factors 
which may have a 
significant influence on the 
effectiveness and 
credibility of the agency 
producing statistics. 


From what type of organisation is it acceptable to obtain
data? e.g. Private, public, non-profit, professional
leader/expert in the topic etc.


Do we care how the data has been collected? e.g. Police
report (so the individual does not report on themselves),
self-assessment, personal interview, independent reviewer etc.


Relevance 


refers to how well the 
statistical product or 
release meets the needs of 
users in terms of the 
concept(s) measured, and 
the population(s) 
represented. 


What population are we interested in? 


o If the data we are thinking of using has some exclusions 
from this population (e.g. the population may exclude 
people in hospitals or nursing homes, or prisons, or a 
particular age group, or people who don't have an 
Australian Business Number, or people who don't earn 
over a particular amount) is it still okay for our purposes? 


What geographic level of detail do we require? 

For what reference period do we need the data?
What are the main data items / outputs we require? 
What sort of analysis do we wish to conduct on the data? 


o Do we require group/range or individual/continuous 
variables? 


o Do we need grouped/range information for income (e.g.
$0 - $5000), or actual income responses (e.g. $1230)?
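
One reason this choice matters is that actual responses can always be grouped into ranges later, whereas ranges cannot be turned back into actual values. A minimal Python sketch of the grouping step follows. The range boundaries, labels and the function name `income_range` are assumptions made for illustration (whole-dollar incomes are assumed) and do not come from any particular collection.

```python
from bisect import bisect_left

# Hypothetical upper bounds (whole dollars) for each range; illustrative only.
UPPER_BOUNDS = [5000, 10000, 20000]
LABELS = ["$0 - $5000", "$5001 - $10000", "$10001 - $20000", "Over $20000"]


def income_range(income):
    """Return the label of the range containing an actual income response."""
    if income < 0:
        raise ValueError("income must be non-negative")
    # bisect_left gives the position of the first upper bound >= income,
    # which is also the position of the matching label.
    return LABELS[bisect_left(UPPER_BOUNDS, income)]


print(income_range(1230))
```

If only the grouped form is obtained, a later need for different ranges (or for averages) cannot be met from the same data.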


Timeliness 


refers to the delay between 
the reference period (to 
which the data pertains) 
and the date on which the 
data becomes available; 
and the delay between the 
advertised date and the 
date at which the data 
become available (i.e., the 
actual release date). 


When do we need the data? 
Will we want to get a time series of this information? 


What is an acceptable delay between the reference period 
and the availability of the data for our purposes? 


Accuracy 


refers to the degree to 
which the data correctly 
describes the phenomenon 
they were designed to 
measure. 


What level of quality is acceptable for the data we
require? e.g. What relative standard error on an estimate
is acceptable?
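
The relative standard error is the standard error of an estimate expressed as a percentage of the estimate itself. The sketch below, in Python, shows how an agreed maximum could be checked. The function names and the 25% threshold in the example are assumptions for illustration, not a recommended cut-off.

```python
def relative_standard_error(estimate, standard_error):
    """Relative standard error (RSE) as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return abs(standard_error / estimate) * 100


def meets_threshold(estimate, standard_error, max_rse_percent):
    """True if the estimate's RSE is at or below the agreed maximum."""
    return relative_standard_error(estimate, standard_error) <= max_rse_percent


# Hypothetical example: an estimate of 200 with a standard error of 50,
# checked against an assumed maximum acceptable RSE of 25%.
print(relative_standard_error(200, 50))
print(meets_threshold(200, 50, 25))
```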


Do we want to include data that has been imputed*? 


o If so, what is the maximum percent that can be imputed
for any given data item?
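
Where a data source flags which values were imputed, the percentage imputed for a data item can be compared with the agreed maximum. A minimal Python sketch, assuming each record carries a true/false imputation flag for the item (the flag and the function name `percent_imputed` are hypothetical):

```python
def percent_imputed(imputed_flags):
    """Percentage of records whose value for one data item was imputed."""
    flags = list(imputed_flags)
    if not flags:
        raise ValueError("no records supplied")
    return 100 * sum(1 for flag in flags if flag) / len(flags)


# Hypothetical example: one of four records has an imputed value.
print(percent_imputed([True, False, False, False]))
```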


Do we want to include data that has been edit adjusted**
(e.g. changing responses to fix typos or misunderstood
questions)?


Do we want original data or seasonally adjusted or trend 
data? 


How big does the data source need to be? e.g. A census, a
specified minimum sample/collection size etc.


What is our minimum response rate required for a collection? 
Do we want raw data*** or weighted data? 


What is our maximum acceptable item non-response rate for a question?


Are we interested in levels or the movement of levels? i.e.
Do we care about the number being 10, or is it of more
interest that the number has moved by +1 since last
month/quarter/year etc.?
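
The distinction can be made concrete with a short Python sketch: the levels are the values themselves, and the movements are the changes from one period to the next. The function name `movements` and the example series are assumptions for illustration.

```python
def movements(levels):
    """Period-to-period changes for a series of levels, oldest first."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


# Hypothetical example: a level of 9 last period and 10 this period.
print(movements([9, 10]))
```

A source can be adequate for one use and not the other, so it helps to state which is needed when specifying accuracy requirements.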


Coherence 


refers to the internal 
consistency of a statistical 
collection, product or 
release, as well as its 
comparability with other 
sources of information, 
within a broad analytical 
framework and over time. 


Do we want to combine several sources of data? 


o If so, are the different sources comparable? For example
in terms of definitions, scope, collection method etc.?


Do we need data that conforms to particular standards? e.g. 
International, Australian, health etc. 


Do we need to create a time series of information? 


o If so, is the data over time consistent for our purposes?


e.g. Data items the same, underlying methodology for 
collecting the data the same (e.g. question wording, 
sampling method, collection method etc.), real world 
effects / impacts on the data etc? 


Interpretability 


refers to the availability of 
information to help provide 
insight into the data. 


Do we want a copy of the questions asked? 


Do we require a list of definitions for the concepts 
measured? 


Do we need other explanatory material in order to decide on 
whether to use the data or not? 


Accessibility 


refers to the ease of 
access to data by users, 
including the ease with 
which the existence of 
information can be 
ascertained, as well as the 
suitability of the form or 
medium through which the 
information can be 
accessed. 


What level of detail do we need for this data? e.g. Local 
government area by sex by employment status 


What is our budget for purchasing data? 


What format(s) do we need the data in? e.g. Spreadsheet, 
csv file etc. 


Do we need a contact name in case we need to ask for more 
information about the data? 


Do we require a different breakdown to that which is 
published? e.g. We need single year ages, not the 5 year 
age groups published. 


If we have a special data request, when do we need the data
by?


Would we consider obtaining average/mean data at finer 
levels if this alleviated possible disclosure issues? 


*Imputed: Assigning a value for data that is missing. 


**Edit Adjusted: Adjusting existing data to correct obvious errors (e.g. typos). 


***Raw data is data that has not been adjusted in any way. For example if there were 20 fully 
responding people to a survey, the Raw Data count of the number of respondents would be 20. 


For more information go to http://www.nss.gov.au/DataQuality or email inquiries@nss.gov.au.


